A Generic Authorship Verification Scheme Based on Equal Error Rates
Abstract
We present a generic authorship verification scheme for the PAN-2015 identification task. The scheme trains in two phases on the training corpora. The first phase learns the parameters of each individual feature category, together with decision thresholds based on equal error rates. The second phase builds ensembles of feature categories, which decide by majority vote, because ensembles can outperform single feature categories. All feature categories are deliberately simple, which brings several advantages: the method is entirely independent of external linguistic resources (even word lists), so it can easily be applied to many languages, and classification is very fast. We additionally make use of parallelization. Evaluated on a 40% split of the official PAN-2015 training corpus that was not used for training, the scheme achieved an average corpus accuracy of 68.12%: 60% for the Dutch, 67.5% for the English, 60% for the Greek, and 85% for the Spanish subcorpus. The overall computation runtime was approximately 27 seconds.
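The two training phases can be illustrated with a minimal sketch. The abstract does not say how similarity scores are computed or how the equal-error-rate threshold is located, so everything below is an assumption: the function names `eer_threshold` and `majority_vote` are hypothetical, a higher score is assumed to mean "more likely the same author", and the threshold is simply the observed score at which the false accept rate and false reject rate are closest.

```python
from typing import Sequence, Tuple


def eer_threshold(same_scores: Sequence[float],
                  diff_scores: Sequence[float]) -> Tuple[float, float, float]:
    """Pick the score threshold where FAR and FRR are closest.

    same_scores: scores of training problems with the same author.
    diff_scores: scores of training problems with different authors.
    Assumption: a problem is accepted (same author) when score >= threshold.
    Returns (threshold, false_accept_rate, false_reject_rate).
    """
    best = None
    # Only observed scores can change the error rates, so try each one.
    for t in sorted(set(same_scores) | set(diff_scores)):
        far = sum(s >= t for s in diff_scores) / len(diff_scores)
        frr = sum(s < t for s in same_scores) / len(same_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, far, frr)
    _, threshold, far, frr = best
    return threshold, far, frr


def majority_vote(decisions: Sequence[bool]) -> bool:
    """Accept only if strictly more than half of the ensemble members accept."""
    return 2 * sum(decisions) > len(decisions)


if __name__ == "__main__":
    # Toy scores for one hypothetical feature category.
    same = [0.9, 0.8, 0.7, 0.4]
    diff = [0.6, 0.3, 0.2, 0.1]
    print(eer_threshold(same, diff))
    # Three hypothetical feature categories vote on one unseen problem.
    print(majority_vote([True, False, True]))
```

In this sketch a tie counts as a rejection, so an ensemble with an odd number of feature categories avoids ties altogether.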
Similar resources
A New Formulation for Cost-Sensitive Two Group Support Vector Machine with Multiple Error Rate
Support vector machine (SVM) is a popular classification technique which classifies data using a max-margin separating hyperplane. The normal vector and bias of this hyperplane are determined by solving a quadratic model, which implies that SVM training amounts to an optimization problem. Among the extensions of SVM, the cost-sensitive scheme refers to a model with multiple costs which conside...
Distractorless Authorship Verification
Authorship verification is the task of, given a document and a candidate author, determining whether or not the document was written by the candidate author. Traditional approaches to authorship verification have revolved around a “candidate author vs. everything else” approach. Thus, perhaps the most important aspect of performing authorship verification on a document is the development of an ...
A note on an identity-based ring signature scheme with signer verifiability
Recently, Herranz presented an identity-based ring signature scheme featuring signer verifiability where a signer can prove that he or she is the real signer by releasing an authorship proof. In this paper we show that this scheme is vulnerable to a key recovery attack in which a user’s secret signing key can be efficiently recovered through the use of two known ring signatures and their corres...
Non-linear PLDA for i-vector speaker verification
Two approaches are presented for non-linear PLDA to be used in speaker verification. In NIST 2010 speaker recognition evaluation (SRE) tests under DET-5 conditions, the two methods and particularly their combination provided significant improvements in equal error rates and minDCF values over a standard PLDA scheme. The proposed schemes were also applied within a speaker verification system tha...
Linguistic Profiling for Authorship Recognition and Verification
A new technique is introduced, linguistic profiling, in which large numbers of counts of linguistic features are used as a text profile, which can then be compared to average profiles for groups of texts. The technique proves to be quite effective for authorship verification and recognition. The best parameter settings yield a False Accept Rate of 8.1% at a False Reject Rate equal to zero for t...
Publication date: 2015